An Evolutionary Random Policy Search Algorithm for Solving Markov Decision Processes

Authors

  • Jiaqiao Hu
  • Michael C. Fu
  • Vahid Reza Ramezani
  • Steven I. Marcus
Abstract

This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
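The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy MDP (transition matrices `P0` and `P1`, cost terms `target` and `base`, action space [0, 1]), the parameters `q0` and `radius`, and all function names are assumptions made for illustration. Each population of policies stands in for one random sub-MDP, and the "variant of policy improvement" is assumed here to be a single step that, for every state, copies the action of the population member with the lowest cost-to-go.

```python
import numpy as np

def evaluate(policy, P0, P1, target, base, gamma):
    """Exact discounted cost of a stationary policy (one action in [0, 1] per state)."""
    a = policy[:, None]
    P = (1.0 - a) * P0 + a * P1          # hypothetical transition model: mix of two chains
    c = (policy - target) ** 2 + base    # hypothetical one-stage cost
    n = len(policy)
    return np.linalg.solve(np.eye(n) - gamma * P, c)

def elite_policy(population, values):
    """Per state, copy the action of the population member with the lowest cost-to-go."""
    best = np.argmin(values, axis=0)     # index of the best policy for each state
    return population[best, np.arange(population.shape[1])]

def next_population(elite, size, q0, radius, rng):
    """Keep the elite; fill the rest by global sampling (prob. q0) or local search."""
    n = len(elite)
    pop = np.empty((size, n))
    pop[0] = elite
    for i in range(1, size):
        explore = rng.random(n) < q0
        global_draw = rng.random(n)      # uniform over the whole action space [0, 1]
        local_draw = np.clip(elite + rng.uniform(-radius, radius, n), 0.0, 1.0)
        pop[i] = np.where(explore, global_draw, local_draw)
    return pop

def erps_sketch(n_states=5, pop_size=10, iters=30, gamma=0.9,
                q0=0.5, radius=0.05, seed=0):
    rng = np.random.default_rng(seed)
    P0 = rng.random((n_states, n_states))
    P0 /= P0.sum(axis=1, keepdims=True)
    P1 = rng.random((n_states, n_states))
    P1 /= P1.sum(axis=1, keepdims=True)
    target, base = rng.random(n_states), rng.random(n_states)
    population = rng.random((pop_size, n_states))
    history = []
    for _ in range(iters):
        values = np.array([evaluate(p, P0, P1, target, base, gamma)
                           for p in population])
        elite = elite_policy(population, values)
        history.append(evaluate(elite, P0, P1, target, base, gamma))
        population = next_population(elite, pop_size, q0, radius, rng)
    return elite, np.array(history)

if __name__ == "__main__":
    elite, history = erps_sketch()
    print("elite policy:", np.round(elite, 3))
    print("cost-to-go, first vs. last iteration:")
    print(np.round(history[0], 4))
    print(np.round(history[-1], 4))
```

Because the elite policy is carried into the next population, its cost-to-go in this sketch cannot increase from one iteration to the next in any state. The sketch makes no claim about convergence rates or about the sampling distributions used in the paper.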

Similar resources

An Evolutionary Random Search Algorithm for Solving Markov Decision Processes

heterogeneous and dynamic problems of engineering technology and systems for industry and government. ISR is a permanent institute of the University of Maryland, within the Glenn L. Martin Institute of Technology/A. James Clark School of Engineering. It is a National Science Foundation Engineering Research Center. Web site http://www.isr.umd.edu I R INSTITUTE FOR SYSTEMS RESEARCH TECHNICAL RESE...

A Genetic Search In Policy Space For Solving Markov Decision Processes

Markov Decision Processes (MDPs) have been studied extensively in the context of decision making under uncertainty. This paper presents a new methodology for solving MDPs, based on genetic algorithms. In particular, the importance of discounting in the new framework is dealt with and applied to a model problem. Comparison with the policy iteration algorithm from dynamic programming reveals the ...

Solving Multi-objective Optimal Control Problems of chemical processes using Hybrid Evolutionary Algorithm

Evolutionary algorithms have been recognized to be suitable for extracting approximate solutions of multi-objective problems because of their capability to evolve a set of non-dominated solutions distributed along the Pareto frontier. This paper applies an evolutionary optimization scheme, inspired by Multi-objective Invasive Weed Optimization (MOIWO) and Non-dominated Sorting (NS) strategi...

A Survey of Some Simulation-based Algorithms for Markov Decision Processes

Many problems modeled by Markov decision processes (MDPs) have very large state and/or action spaces, leading to the well-known curse of dimensionality that makes solution of the resulting models intractable. In other cases, the system of interest is complex enough that it is not feasible to explicitly specify some of the MDP model parameters, but simulated sample paths can be readily generated...

Integrating value functions and policy search for continuous Markov Decision Processes

Value function approaches for Markov decision processes have been used successfully to find optimal policies for a large number of problems. Recent findings have demonstrated that policy search can be used effectively in reinforcement learning when standard value function techniques become overwhelmed by the size and dimensionality of the state space. We demonstrate that substantial benefits ca...

Journal:
  • INFORMS Journal on Computing

Volume 19

Publication date: 2007